EXPRESSION OF SELF-PROCESSING
POLYPROTEINS IN TRANSGENIC PLANTS



This invention relates generally to the expression of 
self-processing polyproteins in transgenic plants.

The relative levels of expression of several
introduced genes in transgenic plants are notoriously
influenced by "position effects" determined by the
particular site of transgene integration into the genome.
Even when introduced genes are linked on the same T-DNA,
driven by either convergent or divergent promoters, they
are usually not co-ordinately expressed at similar
levels. This poses particular problems when high levels
of expression of a number of introduced activities are
required, for instance when attempting to express novel
biochemical pathways in plants. In an attempt to achieve
tissue-specific, co-ordinated expression of two proteins,
other researchers have linked genes by co-transference on
the same T-DNA. The expression levels of linked nopaline
synthase (nos) and octopine synthase (ocs) genes and
closely adjacent neomycin phosphotransferase II (NPTII)
and chloramphenicol acetyl transferase (CAT) reporter
genes were found to vary independently. Another strategy
was to link genes via adjacent and divergent promoters.
Whereas co-ordinated expression of the Cab22L and Cab22R
genes of petunia was achieved, a similar approach using
the CAT and GUS genes produced a high degree of variation
of CAT and GUS activities within individual transgenes.
Linking proteins in the form of polyproteins is a
strategy adopted in the replication of many viruses. On
translation, virus-encoded proteinases mediate extremely
rapid intramolecular (cis) cleavages of the polyprotein
to yield discrete protein products.

A number of viral proteinases have been partially
characterised and are thought to be related, both
structurally and in catalytic mechanism, to cellular
proteinases. In the picornaviridae a single long open
reading frame encodes a polyprotein of some 225kDa, but
full-length translation products are not normally
observed due to extremely rapid "primary" intramolecular
(cis) cleavages mediated by virus-encoded proteinases. In
the case of the entero- and rhinoviruses, a primary
cleavage occurs between the P1 capsid protein precursor
and the replicative domains of the polyprotein (P2, P3;
Figure 1, A). This cleavage is mediated by a virus-encoded
proteinase (2Apro), of some 17kDa, cleaving at its own
N-terminus.

The aphtho-, or Foot-and-Mouth Disease (FMD),
viruses form a distinct group within the picornaviridae.
The FMDV polyprotein undergoes a primary cleavage
at the C-terminus of the 2A region between the capsid
protein precursor (P1-2A) and the replicative domains of the
polyprotein, 2BC and P3 (Figure 1, B). Precursors spanning
the 2A/2B cleavage site are not detected during native
polyprotein processing. This situation is somewhat
analogous to the 2A cleavage in entero- and rhinoviruses
described above. However, the 2A region of the FMDV
polyprotein was demonstrated to be only 16 amino acids
long (Figure 1, C). The predicted amino acid sequence of
this region is totally conserved amongst all aphthovirus
genomic RNAs sequenced to date. The FMDV 2A region also
shows high similarity to the C-terminal region of the
approximately ten-fold larger 2A protein of another genus
of the picornaviridae, the cardioviruses.

According to the present invention, there is provided 
a method for the expression of multiple proteins in a 
transgenic plant comprising inserting into the genome of 
the plant a gene construct comprising a 5'-region which
includes a promoter which is capable of initiating
transcription of a structural gene under the control
thereof, a protein encoding sequence coding for more than
one protein and a 3'-terminator region which includes a
polyadenylation signal, each of the said protein encoding 
sequences being separated from an adjacent protein 
encoding sequence by a DNA sequence which on translation 
provides a cleavage site whereby the expressed 
polyprotein is post-translationally processed into the 
component protein molecules. 

Preferably the DNA sequence which encodes the
post-translation cleavage site is derived from a virus,
particularly a picornavirus.

Preferably also the DNA sequence providing the
cleavage site encodes the amino acid sequence
NFDLLKLAGDVESNPGPFF [SEQ ID NO. 1]. However, it is well
known that variations may be made in amino acid sequences
which do not greatly affect function and it is intended
that such variants of the said sequence and the
nucleotide sequence which encodes it are within the scope
of this invention.

Thus multiple genes are inserted into a plant genome 
under the control of a single promoter, in the form of a 
self-processing polyprotein.
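
The arrangement described above can be illustrated with a minimal sketch in Python. It is an illustration only and not part of the method as practised: the two protein sequences are made-up placeholders (not CAT or GUS), the function names are hypothetical, and the cut position (after the glycine of the NPG-P motif, so that the proline begins the downstream product) is an assumption drawn from the statement in this specification that the final proline lies after the actual 2A cleavage site.

```python
# Sketch only. Protein sequences used with these functions are placeholders.
LINKER_2A = "NFDLLKLAGDVESNPGPFF"  # SEQ ID NO. 1 as given in the text


def build_polyprotein(proteins, linker=LINKER_2A):
    """Join protein sequences into one chain, with a linker between neighbours."""
    return linker.join(proteins)


def cleave_at_2a(polyprotein, motif="NPGP"):
    """Split the chain after 'NPG' of every NPGP motif (assumed cut position).

    The proline that follows the cut becomes the first residue of the
    next product, mirroring the 2A/2B junction described in the text.
    """
    products = []
    start = 0
    pos = polyprotein.find(motif)
    while pos != -1:
        cut = pos + 3  # index just past the glycine of 'NPG'
        products.append(polyprotein[start:cut])
        start = cut
        pos = polyprotein.find(motif, cut)
    products.append(polyprotein[start:])
    return products


if __name__ == "__main__":
    # Hypothetical six-residue stand-ins for two proteins of interest.
    chain = build_polyprotein(["MAAAKT", "MVVVRS"])
    for product in cleave_at_2a(chain):
        print(product)
```

Under these assumptions the upstream product keeps the first 16 linker residues at its carboxy terminus, and the downstream product begins with the remaining residues of the linker (PFF).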

We have found that FMDV 2A represents a polyprotein 
region whose sole function in the viral genome is to 
effect or direct a single cleavage at its own C-terminus, 
functioning only in cis. Due to its small size, FMDV 2A 
(and related sequences or derivatives) presents an ideal 
candidate for engineering into plant polyprotein
expression vectors.

The inclusion of such proteinase or cleavage
sequences in plant transformation constructs enables the
expression from a single promoter of multiple introduced
proteins, initially linked as a polyprotein, in plant
cells and plants.

Initial characterisation of the FMDV 2A region, using
a series of recombinant FMDV polyproteins, showed that
the FMDV 2A/2B cleavage activity was mediated by residues
located within the 19 amino acid sequence
NFDLLKLAGDVESNPGPFF [SEQ ID NO. 1] spanning the FMDV 2A
region (Ryan et al, 1991). It was found that FMDV 2A does
not act as a substrate for a proteinase located elsewhere
within the FMDV polyprotein, nor does it absolutely require
particular FMDV domains for activity. Three explanations
may account for a co-translational cleavage associated
with such a short sequence: (i) FMDV 2A functions as a
substrate for a cellular proteinase, which, to account
for the observed cleavage kinetics, would need to be
closely coupled to translation, (ii) the FMDV 2A sequence
in some manner disrupts the normal peptide bond formation
during translation, or (iii) the FMDV 2A sequence
possesses an entirely novel type of proteolytic activity.
These data do not, however, clarify to what extent the 2A
region could function independently of the physical
environment provided by FMDV polyprotein sequences. To
address this question we have studied proteolysis
associated with FMDV 2A in a completely foreign context,
that of a synthetic polyprotein designed such that two
reporter genes flank sequences from the FMDV 2A region of
the polyprotein.

We have constructed a plasmid (pCAT2AGUS) in which
the 20 amino acid sequence spanning FMDV 2A was inserted
between the reporter genes chloramphenicol acetyl
transferase (CAT) and β-glucuronidase (GUS), maintaining a
single, long, open reading frame (Figure 2). Translation
studies were performed in three systems: (i) a coupled
transcription/translation (TnT) rabbit reticulocyte
system, (ii) a wheat germ lysate and (iii) a human cell
line (HTK-143) infected with the recombinant vaccinia
virus VTF7-3 expressing T7 RNA polymerase.

Translation directed by pCAT2AGUS showed three major
translation products in all translation systems. The
uppermost band corresponded to the expected [CAT2AGUS]
polyprotein translation product and was
immunoprecipitated by anti-CAT and anti-GUS antibodies.
The second band co-migrated with GUS, was
immunoprecipitated only by antibodies directed against
GUS and corresponded to a GUS cleavage product. The lower
band migrated somewhat more slowly than CAT, was
immunoprecipitated by anti-CAT and anti-2A antibodies but
not anti-GUS antibodies and corresponded to the [CAT2A]
cleavage product. Densitometric analysis determined that
80% of the translated polyprotein had undergone cleavage
to [CAT2A] and [GUS]. Assays provided evidence that both
cleaved proteins were enzymically active.

These analyses show that the inserted FMDV sequence
can function in a manner similar to that observed in FMDV
polyprotein processing: the [CAT2AGUS] polyprotein
undergoes a co-translational, apparently autoproteolytic,
cleavage yielding [CAT2A] and GUS. It is clear that the
20 amino acid FMDV 2A-spanning sequence does not require
other domains within the FMDV polyprotein to function and
is an autonomous element capable of mediating cleavage,
even in a completely foreign context. Our analysis of
N-terminally truncated forms of FMDV 2A shows that the 13
residue oligopeptide (-LKLAGDVESNPGP- [SEQ ID NO. 2]) is
able to mediate cleavage whereas the 11 residue
oligopeptide (-LAGDVESNPGP- [SEQ ID NO. 3]) is not. However,
the final proline residue, which comes after the actual
2A cleavage site and constitutes the amino-terminal
residue of 2B in FMDV, is necessary for cleavage. In
addition we have evidence that an oligopeptide comprising
the C-terminal 14 residues of the EMC 2A protein together
with the N-terminal proline residue of EMC protein 2B can
also mediate cleavage in a foreign context, although to a
lower level than FMDV 2A. We are currently determining by
mutagenesis the exact amino acid sequences which permit
cleavage in an attempt to optimise the reaction.
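
The relationship between the truncated oligopeptides and the full 2A-spanning sequence can be checked with a short Python sketch. It only compares the sequences quoted in this specification; the variable and function names are hypothetical, and treating the "core" as everything up to and including the proline that follows the NPG motif is an assumption made for the comparison.

```python
# Sequences as quoted in the text.
SEQ_ID_1 = "NFDLLKLAGDVESNPGPFF"
SEQ_ID_2 = "LKLAGDVESNPGP"  # 13 residues, reported to mediate cleavage
SEQ_ID_3 = "LAGDVESNPGP"    # 11 residues, reported not to mediate cleavage

# Assumed core: 2A plus the first proline of 2B (everything through 'NPGP').
CORE = SEQ_ID_1[: SEQ_ID_1.index("NPGP") + 4]


def residues_removed(fragment, parent=CORE):
    """Return how many N-terminal residues of parent are absent from fragment."""
    if not parent.endswith(fragment):
        raise ValueError("fragment is not an N-terminal truncation of parent")
    return len(parent) - len(fragment)


if __name__ == "__main__":
    print(len(SEQ_ID_1), len(CORE))
    print(residues_removed(SEQ_ID_2), residues_removed(SEQ_ID_3))
    # Residues present in the 13-mer but missing from the 11-mer.
    print(CORE[residues_removed(SEQ_ID_2):residues_removed(SEQ_ID_3)])
```

On this reading the two forms differ only by the leucine and lysine that precede LAGDVESNPGP, which is consistent with the text's observation that the shorter form no longer mediates cleavage; the sketch says nothing about why.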

FMDV 2A together with the N-terminal proline residue 
of protein 2B may represent an entirely novel type of 
protein cleavage activity. Although the mechanism of
cleavage is not yet understood, its utility is apparent. 
Because of its small size the FMDV 2A sequence or 
derivatives of it are particularly attractive candidates 
for use in plant polyprotein expression constructs 
enabling the co-ordinated and stoichiometric expression 
of multiple proteins from a single open reading frame. 

In the model [CAT2AGUS] construct described above,
the 20 amino acid 2A-spanning sequence, which remains
attached to CAT after cleavage from GUS, is sufficiently
short that CAT activity is not impaired. We have also
found no impairment of activity in similar constructs
where 2A is attached to the carboxy terminus of
glutathione-S-transferase or T7 RNA polymerase.

We have introduced the CAT2AGUS construct into
tobacco plants and will assess the efficiency of cleavage
of [CAT2A] from [GUS] in this system. However, since the
cleavage has been effective in all systems tested to
date, including wheat germ lysates, efficient cleavage is
expected. The FMDV 2A sequence (plus the requisite
carboxy-terminal proline), inserted between protein coding
sequences, may allow the engineering of plant expression
vectors where multiple whole proteins or protein domains
can be expressed as a polyprotein and cleaved apart
co-translationally with high efficiency. This may enable the
rapid introduction of entire enzyme cascades into plants.

The invention will now be described with reference to 
the accompanying Figures of which: 

Figure 1 shows: Picornavirus Primary Polyprotein
Cleavages. The 5' non-coding region is capped by a small
protein VPg (or 3B). The single long open reading frame
and polyprotein organisation is shown (boxed areas) for
both entero- and rhinovirus groups (panel A) and
aphthoviruses (panel B). Arrows indicate sites of primary
cleavage and the virus-encoded proteinases responsible,
where known. Primary cleavage products are shown below.
The amino acid sequence spanning the aphthovirus 2A
region of the polyprotein is shown, the 2A oligopeptide
being cleaved from the capsid protein 1D by 3Cpro in an
intermolecular reaction occurring at a later stage of
polyprotein processing (panel C);

Figure 2 shows: CAT/GUS Constructs. Boxed areas 
represent the single open reading frames encoding either 
individual proteins (CAT; pCAT20/21, GUS; pGUS12/23) or 
the artificial polyproteins [CATGUS] and [CAT2AGUS]. All
plasmids were based on pGEM transcription vectors. 

Figure 3 shows the translation products of pGUS, 
pCATGUS, pCAT2AGUS and pCAT; and, 

Figure 4 shows CAT and GUS activity in transgenic 
plants transformed with pCAT2AGUS.

EXAMPLE 1 Construction of CAT and GUS expression
vectors.

The reporter genes CAT and GUS were amplified by PCR
using oligonucleotide primers such that restriction sites
were created at both termini. Individual genes were
cloned into pGEM transcription vectors (pCAT; pGUS) and
also assembled together (pCATGUS) to produce a single
open reading frame encoding the artificial polyprotein
[CATGUS] (Figure 2). Coding sequences from the FMDV 2A
region were assembled in the plasmid vector pGEM-7Zf(+)
in such a way that a series of unique restriction sites
was created throughout the sequence. The FMDV 2A
sequence was excised and inserted between the CAT and GUS
genes of pCATGUS to retain a single open reading frame
and form construct pCAT2AGUS (Figure 2).
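
The requirement that the insertion "retain a single open reading frame" can be expressed as a simple check, sketched here in Python. The DNA fragments in the example are short made-up stand-ins, not the CAT, GUS or 2A coding sequences, and the helper names are hypothetical. The sketch assumes the standard stop codons and that the upstream stop codon is removed on fusion.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_single_orf(dna):
    """True if dna starts with ATG, is a whole number of codons,
    ends in a stop codon and has no earlier in-frame stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0 or not dna.startswith("ATG"):
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return codons[-1] in STOP_CODONS and not any(
        codon in STOP_CODONS for codon in codons[:-1]
    )


def fuse(upstream, linker, downstream):
    """Join upstream + linker + downstream, dropping the upstream stop codon."""
    if upstream[-3:].upper() in STOP_CODONS:
        upstream = upstream[:-3]
    return upstream + linker + downstream


if __name__ == "__main__":
    # Hypothetical toy fragments, for illustration only.
    upstream = "ATGGCTAAATAA"    # ATG GCT AAA TAA
    linker = "AACCCTGGACCT"      # AAC CCT GGA CCT (encodes N-P-G-P)
    downstream = "ATGGTTCGTTGA"  # ATG GTT CGT TGA
    print(is_single_orf(fuse(upstream, linker, downstream)))
    # A linker one nucleotide short shifts the frame and fails the check.
    print(is_single_orf(fuse(upstream, linker[:-1], downstream)))
```

A check of this kind only confirms that the reading frame is continuous; it does not predict whether the encoded linker will mediate cleavage.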

EXAMPLE 2 Expression of CAT and GUS constructs in
wheat germ lysate.

Translation studies were performed in a coupled
transcription/translation (TnT) wheat germ system using
T7 polymerase. Translation directed by plasmids pCAT,
pGUS and pCATGUS yielded polypeptides of the expected
molecular weight for CAT (25.7kDa), GUS (70.4kDa) and the
polyprotein [CATGUS] (96.3kDa) respectively (Figure 3,
lanes 4, 1 and 2). Translation directed by pCAT2AGUS
yielded two major products at 70kDa and 26kDa (lane 3).
The 70kDa polypeptide migrated identically to the product
of pGUS shown in lane 1 and represents GUS polypeptide
processed from the [CAT2AGUS] polyprotein during
translation. The 26kDa band migrated slightly more slowly
than the product of pCAT shown in lane 4 and corresponds
to the [CAT2A] cleavage product. A third, fainter band
appeared at approx. 96kDa (lane 3) and co-migrated with
the product of pCATGUS (lane 2). This corresponds to the
entire polyprotein product of pCAT2AGUS and suggests that
a small proportion of the polyprotein is not processed in
the wheat germ lysate. Densitometric analysis of the
distribution of radiolabel in the polypeptides in lane 3
showed that over 90% of the [CAT2AGUS] translation
product was cleaved to [CAT2A] and GUS.
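
One way to express the densitometric estimate is as the share of total radiolabel found in the two cleavage products. The Python sketch below shows that arithmetic only; the function name is hypothetical and the band intensities in the example are invented placeholder values in arbitrary units, not measurements from Figure 3.

```python
def cleaved_fraction(full_length, upstream_product, downstream_product):
    """Fraction of radiolabel found in the two cleavage products.

    Arguments are band intensities (arbitrary units) for the uncleaved
    polyprotein and for each of the two cleavage products.
    """
    total = full_length + upstream_product + downstream_product
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return (upstream_product + downstream_product) / total


if __name__ == "__main__":
    # Placeholder intensities for illustration only.
    fraction = cleaved_fraction(full_length=8.0,
                                upstream_product=27.0,
                                downstream_product=65.0)
    print(f"{fraction:.0%} of the label is in cleavage products")
```

Because the uncleaved chain carries the same labelled residues as its two products combined, summing both products gives a molar estimate without per-band correction, provided both products are equally stable and fully recovered on the gel.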
EXAMPLE 3 Expression of [CAT2AGUS] polyprotein in
transgenic tobacco.

The [CAT2AGUS] portion of pCAT2AGUS was excised,
inserted between the 35S CaMV promoter and nos 3'
terminator and cloned into a Bin19-derived plant
expression vector containing a functional neomycin
phosphotransferase gene. This construct was introduced
into tobacco via Agrobacterium-mediated leaf disk
transformation. Transformed plants were selected and
rooted on kanamycin. Protein extracts were prepared from
leaves of transformed plants and assayed for GUS and CAT
activities. The results are shown in Figure 4. A high
degree of correlation was found between the activities of
the two enzymes in any given plant. Of 18 independent
transformed plants, seven plants (6, 26, 28, 24, 29, 1
and 12) expressed both enzymes at relatively high levels.
Four plants (2, 14, 15, 16) expressed both enzymes at low
levels. The remaining plants (only three shown in Figure
4, plants 4, 7 and 25) did not express either enzyme at
levels above that detected in untransformed control
plants (A, B, D, E, F).

Western blots of callus and leaf extracts were probed
with anti-GUS antibodies. A major immunoreactive
product was detected at the expected molecular weight for
GUS (70kDa) in plants transformed with [CAT2AGUS] but not
in control plants. This confirmed that [CAT2AGUS]
expressed in plant tissue is processed to [CAT2A] and
GUS.

The data presented here demonstrate that the FMDV 2A
sequence can function in a manner similar to that
observed in FMDV polyprotein processing when expressed in
chimeric genes in plant extracts and cells. Both in wheat
germ lysates and transgenic plants, the chimeric
[CAT2AGUS] polyprotein undergoes rapid, apparently
autoproteolytic, cleavage to yield [CAT2A] and GUS.

FMDV 2A may represent an entirely novel type of 
protein cleavage activity. Although the mechanism of 
cleavage is not yet understood, its utility is apparent. 

Because of its small size the FMDV 2A sequence or
derivatives of it are particularly attractive candidates
for use in plant polyprotein expression constructs
enabling the co-ordinated and stoichiometric expression of
multiple proteins from a single open reading frame.

In the model [CAT2AGUS] construct described above,
the 20 amino acid 2A-spanning sequence, which remains
attached to CAT after cleavage from GUS, is sufficiently
short that CAT activity is not impaired. We have also
shown no impairment of activity in similar constructs
where 2A is attached to the carboxy terminus of
glutathione-S-transferase or T7 RNA polymerase.

The FMDV 2A sequence, inserted between protein coding
sequences, allows the engineering of plant expression
vectors where multiple whole proteins or protein domains
can be expressed as a polyprotein and cleaved apart
co-translationally with high efficiency. This may enable the
rapid introduction of entire enzyme cascades into plants.
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Leu Lys Leu Ala Gly Asp Val Glu Ser Asn Pro Gly Pro
1               5                   10
CLAIMS

1. A method for the expression of multiple proteins
in a transgenic plant comprising inserting into the
genome of the plant a gene construct comprising a
5'-region which includes a promoter which is capable of
initiating transcription of a structural gene under
the control thereof, a protein encoding sequence
coding for more than one protein and a 3'-terminator
region which includes a polyadenylation signal, each
of the said protein encoding sequences being
separated from an adjacent protein encoding sequence
by a DNA sequence which on translation provides a
cleavage site whereby the expressed polyprotein is
post-translationally processed into the component
protein molecules.

2. A method as claimed in claim 1, in which the DNA
sequence which encodes the post-translation cleavage
site is derived from a virus, particularly a
picornavirus.

3. A method as claimed in claim 1, in which the DNA
sequence providing the cleavage site encodes the
amino acid sequence NFDLLKLAGDVESNPGPFF [SEQ ID
NO. 1].

4. A gene construct for use in the genetic
modification of plants comprising in sequence, a gene
promoter active in plant cells, a plurality of coding
regions and a 3'-non-translated region containing a
polyadenylation signal, characterised in that each of
the plurality of coding regions is separated by a DNA
sequence which on translation provides a cleavage
site whereby the expressed polyprotein is
post-translationally processed into the component protein
molecules.


5. A gene construct as claimed in claim 4, in which
the cleavage site has the amino acid sequence
NFDLLKLAGDVESNPGPFF [SEQ ID NO. 1].